Conceptual Clustering Using Lingo Algorithm: Evaluation on Open Directory Project Data

Authors

  • Stanislaw Osinski
  • Dawid Weiss
Abstract

The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a list of search hits returned by a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use the Open Directory Project (ODP) as a source of high-quality, narrow-topic document references and mix them into several multi-topic test sets for the algorithm. We then compare the clusters produced by Lingo to the expected set of ODP categories mixed into the input. Finally, we discuss observations from the experiment, highlighting the algorithm’s strengths and weaknesses, and conclude with research directions for the future.


Similar resources

Test of BibTEX references

[1] J. Stefanowski and D. Weiss, “Carrot2 and language properties in web search results clustering,” in Proceedings of AWIC-2003, First International Atlantic Web Intelligence Conference, ser. Lecture Notes in Computer Science, E. M. Ruiz, J. Segovia, and P. S. Szczepaniak, Eds., vol. 2663. Madrid, Spain: Springer, 2003, pp. 240–249. [Online]. Available: http://www.cs.put.poznan.pl/dweiss/xml/ ...


Efficient Clustering of Web Search Results Using Enhanced Lingo Algorithm

Web query optimization is the focus of recent research and development efforts. To fetch the required information, the users are using search engines and sometimes through the website interfaces. One approach is search engine optimization which is used by the website developers to popularize their website through the search engine results. Clustering is a main task of explorative data mining pr...


Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition

Search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo—a novel algorithm for clustering search results, which emphasizes cluster description quality. We describe methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phras...


Ontology-Based File Naming Through Hierarchical Conceptual Clustering

Current directory-based hierarchical file systems have many limitations as the amount of unstructured data possessed by individual user is increasing continuously. One of the most significant problems is that users usually have difficulties searching, navigating, and organizing their files since useful semantic information describing a file is not used in the current directory-based system. To ...


Scuba Diver: Subspace Clustering of Web Search Results

Current search engines present their search results as a ranked list of Web pages. However, as the number of pages on the Web increases exponentially, so does the number of search results for any given query. We present a novel subspace clustering based algorithm to organize keyword search results by simultaneously clustering and identifying distinguishing terms for each cluster. Our system, na...



Publication date: 2004